ABSTRACT

Recently several studies have jointly analysed data from different cosmological probes
with the motivation of estimating cosmological parameters. Here we generalise this
procedure to take into account the relative weights of various probes. This is done by
including in the joint $\chi^2$ function a set of 'Hyper-Parameters', which are dealt with
using Bayesian considerations. The resulting algorithm (in the case of uniform priors
on the log of the Hyper-Parameters) is very simple: instead of minimising $\sum_j \chi^2_j$
(where $\chi^2_j$ is computed per data set $j$) we propose to minimise $\sum_j N_j \ln(\chi^2_j)$
(where $N_j$ is the number of data points per data set $j$). We illustrate the method by
estimating the Hubble constant $H_0$ from different sets of recent CMB experiments
(including Saskatoon, Python V, MSAM1, TOCO and Boomerang).
1 INTRODUCTION

Several groups (e.g. Eisenstein et al. 1999, Gawiser & Silk
1998, Bridle et al. 1999, Bahcall et al. 1999, Bond & Jaffe
1998, Lineweaver 1998) have recently estimated cosmological
parameters by joint analysis of data sets (e.g. CMB, SNe
Ia, redshift surveys, cluster abundance and peculiar velocities).

A complication that arises in combining data sets is
that there is freedom in assigning the relative weights of different
measurements. Some approaches to this problem have
been suggested in the astronomical literature (e.g. Godwin &
Lynden-Bell 1987; Press 1996). Here we propose a Bayesian
approach utilizing 'Hyper-Parameters' (hereafter HPs).

Assume that we have two independent data sets, $D_A$ and
$D_B$ (with $N_A$ and $N_B$ data points respectively), and that we
wish to determine a vector of free parameters $w$ (such as the
density parameter $\Omega_m$, the Hubble constant $H_0$, etc.). This
is commonly done by minimising

$$\chi^2_{\rm joint} = \chi^2_A + \chi^2_B \tag{1}$$

(or maximizing the sum of log-likelihood functions). Such
procedures assume that the quoted observational random
errors can be trusted, and that the two (or more) $\chi^2$s have
equal weights. However, when combining 'apples and oranges'
one may wish to allow freedom in the relative weights.
One possible approach is to generalise Eq. (1) to be

$$\chi^2_{\rm joint} = \alpha\,\chi^2_A + \beta\,\chi^2_B \tag{2}$$



where $\alpha$ and $\beta$ are 'Lagrange multipliers', or 'Hyper-Parameters',
which are to be evaluated in a Bayesian way.
There are a number of ways to interpret the meaning of the
HPs. A simple example is the case where

$$\chi^2_A = \sum_{i=1}^{N_A} \frac{[x_{{\rm obs},i} - x_{{\rm pred},i}(w)]^2}{\sigma_i^2}, \tag{3}$$

where the sum is over $N_A$ measurements and corresponding
predictions and errors $\sigma_i$. Hence by multiplying by $\alpha$ each
error effectively becomes $\alpha^{-1/2}\sigma_i$. But even if the measurement
errors are accurate, the HPs are useful in assessing the
relative weight of different experiments. It is not uncommon
that astronomers discard measurements (i.e. by assigning
$\alpha = 0$) in an ad-hoc way. The procedure we propose gives
an objective diagnostic as to which measurements are problematic
and deserve further understanding of systematic or
random errors.
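The statement that multiplying $\chi^2_A$ by $\alpha$ is equivalent to rescaling every error to $\alpha^{-1/2}\sigma_i$ can be checked numerically. The following is a minimal Python sketch, assuming NumPy; the function name `chi2` and the toy numbers are hypothetical and do not come from any data set in this paper.

```python
import numpy as np


def chi2(x_obs, x_pred, sigma):
    """Eq. (3): sum of squared residuals normalised by the quoted errors."""
    residuals = (np.asarray(x_obs) - np.asarray(x_pred)) / np.asarray(sigma)
    return float(np.sum(residuals ** 2))


# Hypothetical measurements, predictions and errors (illustrative only)
x_obs = np.array([1.2, 0.8, 1.5, 0.9])
x_pred = np.ones(4)
sigma = np.array([0.2, 0.3, 0.25, 0.1])
alpha = 0.25  # hypothetical hyper-parameter value

weighted = alpha * chi2(x_obs, x_pred, sigma)            # alpha * chi^2_A, as in Eq. (2)
rescaled = chi2(x_obs, x_pred, sigma / np.sqrt(alpha))   # each sigma_i -> alpha**-0.5 * sigma_i
print(weighted, rescaled)  # equal up to floating-point rounding
```

With $\alpha = 0.25$ every error bar is doubled, so $\alpha < 1$ down-weights a data set and $\alpha = 0$ removes it entirely.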
We show below that if the prior probabilities for $\ln\alpha$
and $\ln\beta$ are uniform then one should consider the quantity

$$-2\ln P(w|D_A,D_B) = N_A\ln(\chi^2_A) + N_B\ln(\chi^2_B) \tag{4}$$

instead of Eq. (1). It is as easy to calculate this statistic as
the standard $\chi^2$. The effective HPs can then be identified as
$\alpha_{\rm eff} = N_A/\chi^2_A$ and $\beta_{\rm eff} = N_B/\chi^2_B$, where $\chi^2_A$ and $\chi^2_B$ are computed
at the values of the parameters $w$ that minimise Eq. (4). The
derivation and interpretation of these results are given in
Section 2. In Section 3 we apply the method to a set of
CMB experiments and estimate the best-fit Hubble constant.
Extensions of the method are discussed in Section 4.
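As a minimal illustration of how cheap the statistic is, the following Python sketch (assuming NumPy) evaluates Eq. (4), written for any number of data sets, next to the conventional sum of Eq. (1), together with the effective HPs. The function names and the $\chi^2$ values are hypothetical; in a real analysis the $\chi^2_j$ would first be computed at the parameters $w$ that minimise Eq. (4).

```python
import numpy as np


def hp_statistic(chi2_values, n_points):
    """Sum_j N_j ln(chi^2_j): -2 ln P(w|data) up to an additive constant, Eq. (4)."""
    chi2_values = np.asarray(chi2_values, dtype=float)
    n_points = np.asarray(n_points, dtype=float)
    return float(np.sum(n_points * np.log(chi2_values)))


def conventional_statistic(chi2_values):
    """Sum_j chi^2_j, the usual joint statistic of Eq. (1)."""
    return float(np.sum(chi2_values))


def effective_hps(chi2_values, n_points):
    """alpha_eff,j = N_j / chi^2_j; meaningful at the minimum of hp_statistic."""
    return np.asarray(n_points, dtype=float) / np.asarray(chi2_values, dtype=float)


# Hypothetical chi^2 values of two data sets at some trial parameter vector w
chi2_ab = [7.0, 20.0]
n_ab = [7, 10]
print(conventional_statistic(chi2_ab), hp_statistic(chi2_ab, n_ab), effective_hps(chi2_ab, n_ab))
```

In a fit, either statistic would be minimised over $w$ with the same optimiser; only the objective function changes.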



2 'HYPER-PARAMETERS' 

How do we eliminate the unknown HPs $\alpha$ and $\beta$? We follow
here the Bayesian formalism given in Gull (1989), MacKay
(1994) and Bishop (1995). The formalism in these references
was given in the context of Maximum Entropy and Artificial
Neural Networks.

By marginalisation over $\alpha$ and $\beta$ we can write the probability
for the parameters $w$ given the data:

$$P(w|D_A,D_B) = \int\!\!\int P(w,\alpha,\beta|D_A,D_B)\,d\alpha\,d\beta. \tag{5}$$



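Using Bayes' theorem we can write the following relations

$$P(w,\alpha,\beta|D_A,D_B) = \frac{P(D_A,D_B|w,\alpha,\beta)\,P(w,\alpha,\beta)}{P(D_A,D_B)} \tag{6}$$

and

$$P(w,\alpha,\beta) = P(w|\alpha,\beta)\,P(\alpha,\beta). \tag{7}$$

We now make the following assumptions:

$$P(D_A,D_B|w,\alpha,\beta) = P(D_A|w,\alpha)\,P(D_B|w,\beta), \tag{8}$$

$$P(w|\alpha,\beta) = {\rm const.}, \tag{9}$$

$$P(\alpha,\beta) = P(\alpha)\,P(\beta). \tag{10}$$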
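Spelling out the intermediate step (a short derivation added for clarity): substituting Eqs. (6) to (10) into Eq. (5), and noting that $P(D_A,D_B)$ does not depend on $w$, the double integral factorises for a general choice of prior:

$$P(w|D_A,D_B) \propto \int P(D_A|w,\alpha)\,P(\alpha)\,d\alpha \;\int P(D_B|w,\beta)\,P(\beta)\,d\beta .$$

Each factor involves only one data set, so the prior enters separately for each HP.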



With the choice of 'non-informative' uniform priors in the
log, $P(\ln\alpha) = P(\ln\beta) = 1$ (Jeffreys 1939), we get $P(\alpha) =
1/\alpha$ and $P(\beta) = 1/\beta$. Note that the integral over priors
of this kind diverges (such a prior is called 'improper', see
Bishop 1995). These are very conservative priors, essentially
stating that we are ignorant about the scale of measurements
and errors. The other extreme is obviously $P(\alpha) = \delta(\alpha - 1)$,
i.e. when the measurements and errors are taken faithfully.
One can try other forms (see below), but it is likely that
these two extreme forms reasonably bracket the probability
space. Hence:



$$P(w|D_A,D_B) \propto P(D_A|w)\,P(D_B|w), \tag{11}$$

where

$$P(D_A|w) = \int P(D_A|w,\alpha)\,\alpha^{-1}\,d\alpha \tag{12}$$

and

$$P(D_B|w) = \int P(D_B|w,\beta)\,\beta^{-1}\,d\beta. \tag{13}$$

It is common to have a likelihood function of the form
of a Gaussian in $N_A$ dimensions:

$$P_c(D_A|w) \propto \exp(-\chi^2_A/2), \tag{14}$$

where we assume for simplicity that the normalization
constant is independent of the parameters $w$ (this is indeed the
case in our application to the CMB measurements in the
next Section).

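We generalise this form to incorporate $\alpha$ as follows:

$$P(D_A|w,\alpha) \propto \alpha^{N_A/2}\exp\left(-\frac{\alpha}{2}\chi^2_A\right), \tag{15}$$

where the factor $\alpha^{N_A/2}$ comes from the normalization of the
Gaussian when each $\sigma_i$ is replaced by $\alpha^{-1/2}\sigma_i$.
The integral of Eq. (12) then gives

$$P(D_A|w) \propto (\chi^2_A)^{-N_A/2}, \tag{16}$$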



and similarly for Eq. (13). We note that it is the marginalisation
over $\alpha$, with the specific choice of prior $P(\alpha) = 1/\alpha$, that has led
to a change from a Gaussian distribution (Eq. 14) to a power law (Eq. 16).
Eq. (11) can then be written (ignoring constants) as

$$-2\ln P(w|D_A,D_B) = N_A\ln(\chi^2_A) + N_B\ln(\chi^2_B). \tag{17}$$



To find the best-fit parameters $w$ we minimise the above
quantity, $-2\ln P$, in the $w$ space (equivalently, we maximise the
probability). Eq. (17) generalises a
similar equation derived by Cash (1979). Cash used a very
different set of arguments based on maximum likelihood, and
he assumed that the error per group of data is the same (in
this special case the original quoted errors drop out in the
minimisation). We emphasize that our Bayesian framework
is more general and 'principled', and therefore we can derive
alternative equations by assuming different priors.
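For completeness, the step from Eq. (15) to Eq. (16) is the standard Gamma-function integral $\int_0^\infty x^{a-1}e^{-bx}\,dx = \Gamma(a)\,b^{-a}$, here with $a = N_A/2$ and $b = \chi^2_A/2$:

$$P(D_A|w) \propto \int_0^\infty \alpha^{N_A/2}\,e^{-\alpha\chi^2_A/2}\,\alpha^{-1}\,d\alpha = \Gamma\!\left(\frac{N_A}{2}\right)\left(\frac{2}{\chi^2_A}\right)^{N_A/2} \propto (\chi^2_A)^{-N_A/2},$$

so that $-2\ln P(D_A|w) = N_A\ln(\chi^2_A) + {\rm const}$; adding the corresponding term for $D_B$ gives Eq. (17).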

Since $\alpha$ and $\beta$ have been eliminated from the analysis
by marginalisation they do not have particular values that
can be quoted. Rather, each value of $\alpha$ and $\beta$ has been considered,
and weighted according to the probability of the
data given the model. However, it may be useful to know
which values of $\alpha$ and $\beta$ were given the most weight. This
can be estimated by finding the values of $\alpha$ and $\beta$ at which
Eq. (15) peaks:

$$\alpha_{\rm eff} = \frac{N_A}{\chi^2_A} \tag{18}$$

and similarly

$$\beta_{\rm eff} = \frac{N_B}{\chi^2_B}, \tag{19}$$

both evaluated at the joint peak.

We note that if we substitute these effective $\alpha$ and $\beta$
in Eq. (2) we obtain $\chi^2_{\rm joint} = N_A + N_B$, i.e. a reduced $\chi^2$ of
unity (for the case when the number of degrees of freedom
is dominated by the number of data points).
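A quick numerical check of Eqs. (18) and (19) and of the reduced-$\chi^2$ remark: the Python sketch below (assuming NumPy; the $\chi^2$ values and data-set sizes are hypothetical) locates the peak of Eq. (15) in $\alpha$ by brute force and compares it with $N/\chi^2$. Setting $\partial \ln P/\partial\alpha = N_A/(2\alpha) - \chi^2_A/2 = 0$ gives the same result analytically.

```python
import numpy as np


def log_p_given_alpha(alpha, chi2, n):
    """ln P(D|w, alpha) from Eq. (15), dropping alpha-independent constants."""
    return 0.5 * n * np.log(alpha) - 0.5 * alpha * chi2


# Hypothetical chi^2 values and data-set sizes at the joint peak
chi2_a, n_a = 20.0, 10
chi2_b, n_b = 3.0, 6

# Brute-force search for the peak of Eq. (15) on a fine grid
alphas = np.linspace(0.01, 5.0, 100_000)
alpha_peak = alphas[np.argmax(log_p_given_alpha(alphas, chi2_a, n_a))]
beta_peak = alphas[np.argmax(log_p_given_alpha(alphas, chi2_b, n_b))]

# Closed forms of Eqs. (18) and (19)
alpha_eff, beta_eff = n_a / chi2_a, n_b / chi2_b

# Substituting them into Eq. (2) gives N_A + N_B
chi2_joint = alpha_eff * chi2_a + beta_eff * chi2_b
print(alpha_peak, alpha_eff, beta_peak, beta_eff, chi2_joint)
```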

There is of course freedom in choosing the prior. For
example, if we take $P(\alpha) = 1$ (instead of Jeffreys' prior
$P(\alpha) = 1/\alpha$) we find that the function to be minimised is
$(N_A + 2)\ln(\chi^2_A)$, instead of $N_A\ln(\chi^2_A)$. Thus these two priors
give very similar results for large $N_A$. Numerous other priors
are possible (e.g. a top-hat centred on a plausible value), but
at the expense of more free HPs (e.g. the width of the top-hat).
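The difference between the two priors can be checked numerically. The Python sketch below (assuming NumPy; the function names and $\chi^2$ values are hypothetical) integrates $\alpha^{N/2}e^{-\alpha\chi^2/2}\alpha^{-k}$ over $\alpha$ on a uniform grid in $\ln\alpha$, with $k = 1$ for Jeffreys' prior and $k = 0$ for $P(\alpha) = 1$, and compares the change in $-2\ln P$ between two $\chi^2$ values with $N$ and $N + 2$ times the log of their ratio.

```python
import math

import numpy as np


def log_marginal_numeric(chi2, n, k, u_min=-30.0, u_max=30.0, num=200_001):
    """ln of int_0^inf alpha**(n/2) * exp(-alpha*chi2/2) * alpha**(-k) d(alpha).

    k = 1 is Jeffreys' prior P(alpha) = 1/alpha; k = 0 is P(alpha) = 1.
    Integrated on a uniform grid in u = ln(alpha), where d(alpha) = alpha du.
    """
    u = np.linspace(u_min, u_max, num)
    du = u[1] - u[0]
    log_integrand = (n / 2.0 - k + 1.0) * u - 0.5 * chi2 * np.exp(u)
    peak = log_integrand.max()  # log-sum-exp trick for numerical stability
    return peak + math.log(np.sum(np.exp(log_integrand - peak)) * du)


def minus_two_delta(chi2_1, chi2_2, n, k):
    """Change in -2 ln P(D|w) when chi^2 goes from chi2_1 to chi2_2."""
    return -2.0 * (log_marginal_numeric(chi2_2, n, k) - log_marginal_numeric(chi2_1, n, k))


n_a = 7
print(minus_two_delta(5.0, 20.0, n_a, k=1), n_a * math.log(4.0))        # N ln(chi^2)
print(minus_two_delta(5.0, 20.0, n_a, k=0), (n_a + 2) * math.log(4.0))  # (N+2) ln(chi^2)
```

Analytically, a prior $P(\alpha) \propto \alpha^{-k}$ turns the function to be minimised into $(N_A - 2k + 2)\ln(\chi^2_A)$, which reproduces both cases quoted above.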



3 APPLICATION TO CMB DATA 

We illustrate the effect of using HPs by application to measurements
of the angular power spectrum of the cosmic microwave
background (CMB). Numerous groups have now
used CMB data to estimate cosmological parameters (see
Rocha 1999 for a review). The most common method is the
flat bandpower method (Bond 1995a; Bond 1995b), in which
the differences between observed and predicted flat bandpowers
are compared using the $\chi^2$ statistic (Eq. 3). There is a
question as to whether one should use all measurements from
all groups of observers, independent of whether a given data
set is consistent with the other data. Dodelson and Knox
(1999) address this issue by assigning a calibration coefficient
to each data point, the values of which are optimised
for each cosmological model investigated. The HPs method
offers a Bayesian alternative both to the ad-hoc selection of data sets
and to the problems associated with combining incompatible data sets
in a conventional analysis.

Data            N_j   Best h     χ²
BOOMERANG/NA      7     0.75     1.5
TOCO              9     0.47    10.5
MSAM1             3     0.92     2.1
PythonV           7     1.00    48.3
Saskatoon         5     0.46     0.5
Other            16     0.65    16.3

Table 1. Conventional analysis using each data subset alone.
For each data subset the number of data points, $N_j$, the best-fit
value of $h$ and the $\chi^2$ value at this point are shown. The full
likelihood distributions in $h$ are shown in Figure 2.

There is clearly a large number of possible combinations
of CMB data sets that could be investigated. For the
purpose of illustration we divide a selection of the current
CMB power spectrum estimates into six subsets. The subsets
are (i) Saskatoon (Netterfield et al. 1997, including the
five per cent calibration error; Leitch, private communication),
(ii) Python V (Coble et al. 1999), (iii) MSAM1 (Wilson
et al. 1999), (iv) TOCO (Torbet et al. 1999; Miller et
al. 1999), (v) BOOMERANG/NA (Mauskopf et al. 1999,
assuming Gaussian window functions which fall by a factor
of $1/e$ at $\ell_{\rm min}$ and $\ell_{\rm max}$ as specified in the paper) and (vi)
all of the remaining data, which we group into the sixth subset and
refer to as 'Other'. This subset contains COBE, Tenerife,
South Pole, ARGO, MAX, QMAP, OVRO and CAT (see
Hancock et al. 1998, Webster et al. 1998 and Efstathiou et
al. 1999 for more details). These data are plotted in Fig. 1.

In addition, for simplicity we restrict ourselves to a very
limited set of cosmological models. We assume that CMB fluctuations
arise from adiabatic initial conditions with cold
dark matter and a negligible tensor component, and that
$\Omega_m = 0.3$, $\Omega_\Lambda = 1 - \Omega_m = 0.7$, $n = 1$, $Q_{\rm rms-ps} = 18\,\mu{\rm K}$ and
$\Omega_b h^2 = 0.019$. We then investigate the constraints on the
remaining parameter, the dimensionless Hubble constant,
$h = H_0/(100\,{\rm km\,s^{-1}\,Mpc^{-1}})$. Theoretical power spectra for
three different values of $h$ are shown in Fig. 1: increasing $h$
decreases the height of the first acoustic peak, and makes
few other significant changes for the purpose of our analysis.
The range in $h$ investigated here ($0.3 < h < 1.0$) takes
the peak height from above the Saskatoon upper error bars
down to the MSAM1 points.

Figure 1. The CMB data used, shown in three panels for clarity.
Theoretical CMB power spectra are shown for h = 0.47, h = 0.70
and h = 1.00 (other parameters are fixed at the values given in
the text).

To aid qualitative understanding of the analysis that
follows, it is helpful first to calculate the value of $h$ preferred
by each data subset. The results are plotted in Fig. 2
and listed in Table 1. As expected from the range of first
acoustic peak heights preferred by the data, the $h$ values
also vary considerably. The BOOMERANG/NA data alone
prefer an intermediate value of $h$, as do the 'Other' data.
TOCO and Saskatoon both agree on a relatively high first
acoustic peak and so on a low $h$. The MSAM1 points are
quite low and thus fit a high value of $h$, and the PythonV
points also prefer a high value of $h$, although the $\chi^2$ value
is very high, indicating a bad fit to the model.

Figure 2. The probability of the Hubble constant h as a function
of h from different subsets of CMB data (as indicated in the
legend), resulting from a conventional analysis.

Clearly there is a large number of possible groupings
of the data subsets. We show here the results from
just five groupings, which are a fair sample and also highlight
some of the properties of HPs. Firstly we consider the
case of two relatively discrepant data sets, Saskatoon and
BOOMERANG/NA; the $h$ values that they prefer do not
overlap significantly. Combining their $\chi^2$ values for each $h$ in
the conventional manner (Eq. 1) yields the likelihood function
plotted with the dotted line in Fig. 3 (top). An intermediate
value of $h$ is preferred, and in fact the best-fitting
$h$ values for each data set alone are essentially ruled out.
In contrast, when HPs are used, i.e. the $\chi^2$ values are combined
using Eq. (17), the dotted line in Fig. 3 (bottom) is
obtained. There are two peaks in the probability distribution
corresponding to the two different values of $h$ preferred
by each data set alone. This is perhaps closer to what we
would actually believe given just these two data subsets.
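The qualitative behaviour described above (a single compromise peak in the conventional analysis versus two separate peaks with HPs) can be reproduced with a toy model. The Python sketch below is an illustration under stated assumptions, not the CMB analysis: two hypothetical data sets of five points each directly measure a single parameter $w$ with a common quoted error, one clustered near $w = 0$ and the other near $w = 1$, and both statistics are scanned on a grid (NumPy assumed; all names are hypothetical).

```python
import numpy as np

SIGMA = 0.1  # quoted error assumed for every toy point

# Hypothetical toy data sets (not the CMB data): each point measures w directly
data_a = np.array([-0.05, 0.02, 0.00, 0.04, -0.01])  # clustered near w = 0
data_b = np.array([0.97, 1.03, 1.00, 0.98, 1.02])    # clustered near w = 1


def chi2(data, w):
    """Eq. (3) with the prediction equal to w for every point."""
    return np.sum(((data[:, None] - w[None, :]) / SIGMA) ** 2, axis=0)


w = np.linspace(-0.5, 1.5, 2001)
chi2_a, chi2_b = chi2(data_a, w), chi2(data_b, w)

conventional = chi2_a + chi2_b                                        # Eq. (1)
hyper = len(data_a) * np.log(chi2_a) + len(data_b) * np.log(chi2_b)  # Eq. (17)

w_conventional = w[np.argmin(conventional)]
# Interior local minima of the HP statistic (i.e. peaks of the HP probability)
is_min = (hyper[1:-1] < hyper[:-2]) & (hyper[1:-1] < hyper[2:])
w_hyper_minima = w[1:-1][is_min]
print(w_conventional, w_hyper_minima)
```

In this toy case the conventional statistic has a single minimum midway between the two clusters, while the HP statistic has one local minimum near each cluster, mirroring the two peaks in Fig. 3 (bottom).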

Next we consider the effect of adding a data subset
that agrees strongly with one of the above two data
subsets. That is, we consider TOCO with Saskatoon and
BOOMERANG/NA. The probability distribution calculated
using HPs now loses its second peak, retaining the
one that agrees with TOCO and Saskatoon. The theoretical
CMB power spectrum for the preferred value of $h = 0.47$ is
shown in Fig. 1.

Data                     N_j   Best h     χ²
BOOM/NA+Sask              12     0.58    11.3
BOOM/NA+TOCO+Sask         21     …       25.6
BOOM/NA+Other             23     0.70    18.6
BOOM/NA+PythonV+Other     30     0.95    81.2
All data                  47     0.68   152.6

Table 2. Conventional analysis. Best-fit values of h and the χ²
values at this best-fit point, which can be compared to the total
number of data points, N_j.

Table 3. The five different combinations used in the Hyper-Parameter
analysis. In each case a separate Hyper-Parameter was
given to each data subset; the number of data points in each data
subset, N_j, is shown. The best-fitting value of h, the χ² value
for each data subset at this best-fitting h, and the effective HP
(N_j/χ²_j) at this h are also given.

On combining two data sets that do agree well,
BOOMERANG/NA and 'Other', there is little difference
between the conventional and HP analyses, although the error
bar on $h$ is slightly smaller when using HPs.

Adding a data subset that has a poor $\chi^2$ (given the
range of models considered), PythonV, makes a large difference
to the conventional analysis but only a very small
difference to the HP analysis. This can also be seen from
the effective HP value at the best-fit $h$, calculated from Eq. (18),
which is much less than unity for the PythonV data,
indicating that it has been down-weighted.
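A similar toy model illustrates the down-weighting. In the hypothetical setup below (again not the CMB data; NumPy assumed), two data sets agree on $w \approx 0.7$ with a scatter comparable to the quoted error, while a third scatters far more widely about $w = 1$. The sketch minimises both statistics on a grid and evaluates Eq. (18) for each data set at the HP best fit.

```python
import numpy as np

SIGMA = 0.1  # quoted error assumed for every toy point

# Hypothetical toy data (not the CMB data): A and B agree on w ~ 0.7 with
# scatter comparable to SIGMA; C scatters far more widely about w = 1.
datasets = [
    np.array([0.60, 0.80, 0.70, 0.75, 0.65]),
    np.array([0.55, 0.85, 0.70, 0.60, 0.80]),
    np.array([0.50, 1.50, 0.60, 1.40, 1.00]),
]
n = np.array([len(d) for d in datasets])
w = np.linspace(0.0, 1.5, 1501)


def chi2(data, w):
    """Eq. (3) with the prediction equal to w for every point."""
    return np.sum(((data[:, None] - w[None, :]) / SIGMA) ** 2, axis=0)


chi2_grid = np.array([chi2(d, w) for d in datasets])   # shape (3, len(w))
conventional = chi2_grid.sum(axis=0)                   # sum of chi^2_j
hyper = (n[:, None] * np.log(chi2_grid)).sum(axis=0)   # sum of N_j ln(chi^2_j)

w_conventional = w[np.argmin(conventional)]
i_hp = np.argmin(hyper)
w_hp = w[i_hp]
alpha_eff = n / chi2_grid[:, i_hp]                     # Eq. (18) for each data set
print(w_conventional, w_hp, alpha_eff)
```

In this toy case the third data set pulls the conventional minimum from 0.7 to 0.8, whereas the HP minimum stays close to 0.7 and the third set's effective HP falls well below unity.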

Finally we use all of the data subsets, obtaining the solid
lines in Fig. 3. It turns out that the best-fitting value of $h$ is
similar in both the conventional and HP analyses, but the
error bars are significantly wider in the HP analysis, which
corresponds better to what we would naturally believe.



© 0000 RAS, MNRAS 000, 000-000 



Joint estimation with 'hyper-parameters' 5 



[Figure 3 panels: 'Conventional χ² Analysis' (top) and 'Hyper-Parameter Analysis' (bottom), probability versus h, with legend entries BOOM/NA+Sask, BOOM/NA+TOCO+Sask, BOOM/NA+Other, BOOM/NA+PythonV+Other and BOOM/NA+TOCO+MSAM1+PythonV+Sask+Other.]

Figure 3. The probability of the Hubble constant h as a function
of h from different subsets of CMB data (as indicated in
the legend), resulting from a conventional analysis (top panel)
and from the Hyper-Parameter analysis (bottom panel), in which
$P(h|D_A, D_B, \ldots) \propto \exp[-\frac{1}{2}\sum_j N_j \ln(\chi^2_j)]$.



4 DISCUSSION 

We have presented a formalism for analyzing a set of different
measurements. By using a Bayesian analysis, and by
using a 'non-informative' prior for the 'Hyper-Parameters',
we find that for $M$ data sets one should minimise

$$-2\ln P(w|{\rm data}) = \sum_{j=1}^{M} N_j \ln(\chi^2_j), \tag{20}$$

where $N_j$ is the number of measurements in data set $j =
1, \ldots, M$. It is as easy to calculate this statistic as the standard
$\chi^2$. The corresponding HPs $\alpha_{{\rm eff},j} = N_j/\chi^2_j$ provide
useful diagnostics on the reliability of different data sets.
We emphasize that a low HP assigned to an experiment
does not necessarily mean that the experiment is 'bad', but
rather it calls attention to look for systematic effects or better
modelling.

We have applied the HP analysis to a variety of
CMB measurements and estimated the Hubble constant $H_0$
(for a fixed flat CDM model with $\Omega_m = 1 - \Omega_\Lambda = 0.3$). While the
standard approach gives a wide range for $H_0$, the Hyper-Parameter
analysis suggests two distinct values of $H_0$, $\sim 50$
and $\sim 70\,{\rm km\,s^{-1}\,Mpc^{-1}}$. It remains to be understood why
the ensemble of CMB experiments tends to give two competing
values (quite common in the history of the Hubble
constant!). It would be most interesting to see how these values
change when combined with other cosmological probes
(and their corresponding HPs).

In estimating $H_0$ we have assumed that the other cosmological
parameters (like $\Omega_m$ and $\Omega_\Lambda$) were known and,
consequently, such an estimate is conditional on the assumed
values of these parameters. This is not strictly necessary,
since we may obtain a more general estimate
of $H_0$ by marginalising over $\Omega_m$, $\Omega_\Lambda$ and the other cosmological
parameters, in a way similar to what we have done
with the HPs. One may also generalise the above method for
more specific applications. Two aspects which can be modified
according to specific problems are the priors $P(\alpha_j)$ and
the probability functions $P(D_j|w)$. We shall discuss elsewhere
these extensions in application to various cosmological
probes.